Design and Optimization of a Speech Recognition Front-End for Distant-Talking Control of a Music Playback Device

Authors

  • Ramin Pichevar
  • Jason Wung
  • Daniele Giacobello
  • Joshua Atkins
Abstract

This paper addresses the challenging scenario of distant-talking control of a music playback device, a common portable speaker with four small loudspeakers in close proximity to one microphone. The user controls the device through voice, where the speech-to-music ratio can be as low as −30 dB during music playback. We propose a speech enhancement front-end that relies on known robust methods for echo cancellation, double-talk detection, and noise suppression, as well as a novel adaptive quasi-binary mask that is well suited for speech recognition. The optimization of the system is then formulated as a large-scale nonlinear programming problem where the recognition rate is maximized and the optimal values for the system parameters are found through a genetic algorithm. We validate our methodology by testing over the TIMIT database for different music playback levels and noise types. Finally, we show that the proposed front-end allows a natural interaction with the device for limited-vocabulary voice commands.

Similar resources

Speech-Recognition Interfaces for Music Information Retrieval: 'Speech Completion' and 'Speech Spotter'

This paper describes music information retrieval (MIR) systems featuring automatic speech recognition. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. We propose two different speech-recognition interfaces for MIR, speech completion and speech spotter, and describe two MIR-based hands-free jukebo...

Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification

Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral domain denoising autoencoder (DAE)-based dereverberation are presented for distant-talking speaker identification, and a comb...

A Stereophonic Acoustic Front-End for Distant-Talking Interfaces based on Blind Source Separation

In this contribution, an acoustic front-end for distant-talking interfaces that only requires two microphone signals is presented. It comprises a directional blind source separation (BSS)-based noise and interference estimation scheme and Wiener-type filters for noise and interference suppression. The proposed front-end and its integration into a speech recognition system are analyzed and evaluat...

Suitable Design of Adaptive Beamform Spectrum for Noisy Speec

Recognition of distant-talking speech is indispensable for self-moving robots or tele-conference systems. However, background noise and room reverberations seriously degrade the sound capture quality in real acoustic environments. A microphone array is an ideal candidate as an effective method for capturing distant-talking speech. AMNOR (Adaptive Microphone-array for NOise Reduction) was propos...

Journal title:
  • CoRR

Volume abs/1405.1379

Publication date: 2014